METRIC AND PROBABILISTIC INFORMATION ASSOCIATED
WITH FREDHOLM INTEGRAL EQUATIONS OF THE FIRST KIND

ENRICO DE MICHELI AND GIOVANNI ALBERTO VIANO 


Abstract. This paper considers the problem of evaluating the information associated with Fredholm integral equations of the first kind, when the integral operator is self-adjoint and compact. The data function is assumed to be perturbed gently by an additive noise, so that it still belongs to the range of the operator. First, we estimate upper and lower bounds for the $\varepsilon$-capacity (and hence for the metric information), and we give explicit computations in some specific cases. Then we reformulate the problem from a probabilistic viewpoint and apply probabilistic information theory. Finally, we compare the results of these two approaches.


1. Introduction 

Let us consider the following class of Fredholm integral equations of the first kind:

$$ Af = g, \tag{1} $$

where $A : X \to Y$ is a self-adjoint compact operator, $X$ and $Y$ being the solution space and the data space, respectively. Hereafter we set $X = Y = L^2[a,b]$.

Solving Equation (1) presents two problems:

a) $\mathrm{Range}(A)$ is not closed in the data space $Y$. Therefore, given an arbitrary function $g \in Y$, a solution $f \in X$ need not exist.

b) Two data functions $g_1$ and $g_2$ may belong to $\mathrm{Range}(A)$ and be close in $Y$. Even so, the distance between $A^{-1}g_1$ and $A^{-1}g_2$ can be arbitrarily large, because the inverse of the compact operator $A$ is unbounded ($X$ and $Y$ being infinite-dimensional spaces).

In numerical applications, $g$ is perturbed by a noise $n$. This noise can represent either numerical round-off error or, if $g$ describes experimental data, measurement error. Assuming in both cases that the perturbation is additive, the data function actually known is $\bar g = g + n$, instead of the noiseless data function $g$. To recover $f$, one is then forced to use the so-called regularization methods. The literature on this topic is very extensive, and we shall return to this point later.

Since the operator $A$ is self-adjoint, it admits a set of eigenfunctions $\{\psi_k\}_{k=1}^{\infty}$ and, accordingly, a countably infinite set of eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$. The eigenfunctions form an orthonormal basis of the orthogonal complement of the null space of $A$. They therefore form an orthonormal basis of $L^2[a,b]$ when $A$ is injective. For simplicity, we consider only this case hereafter. The Hilbert-Schmidt theorem guarantees that $\lim_{k\to\infty}\lambda_k = 0$. We also suppose that the eigenvalues are ordered as $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots$ and, for simplicity, that they are bounded by 1, i.e., $\lambda_1 \le 1$. For the noiseless data function $g$, we can associate with the integral equation the eigenfunction expansion

$$ f(x) = \sum_{k=1}^{\infty} \frac{g_k}{\lambda_k}\,\psi_k(x), \tag{2} $$

where $g_k = (g, \psi_k)$ and $(\cdot,\cdot)$ denotes the scalar product in $L^2[a,b]$. The series (2) converges in the sense of the $L^2$-norm. Unfortunately, this series is of no practical use, because the noiseless data function $g$ is unknown. If we take the additive noise $n$ into account, then instead of Equation (1) we have

$$ Af + n = \bar g. \tag{3} $$

Therefore, instead of expansion (2), we have to deal with an expansion of the type

$$ \sum_{k=1}^{\infty} \frac{\bar g_k}{\lambda_k}\,\psi_k(x), \qquad \bar g_k = (\bar g, \psi_k). \tag{4} $$

This series either diverges, if $\bar g \notin \mathrm{Range}(A)$, or converges to a function whose distance in norm from the true solution $f$ (the one corresponding to the noiseless data) can be quite large. One is then forced to use regularization procedures, as mentioned above.

The mathematical framework outlined so far is only a schematization of reality. In particular, if $g$ describes experimental data, it will obviously be an element of a finite-dimensional space, while the solution $f$ can still be considered an element of an infinite-dimensional function space. In general, therefore, the data space $Y$ and the solution space $X$ may differ. In that case the analysis would require the singular values and singular functions of the operator $A$ […], instead of the eigenvalues $\lambda_k$ and eigenfunctions $\psi_k$. For clarity, we identify here the data with an element $g$ of $L^2[a,b]$ and deal with a self-adjoint operator $A$. This makes the analysis technically simpler and more transparent for our purposes.

Several methods of regularization have been proposed […]. All of them modify one of the elements of the triplet $\{A, X, Y\}$ […]. Probably the most popular of these procedures admits only those solutions which belong to a compact subset of the solution space $X$. The key theorem used in this method reads as follows: let $\sigma$ be a continuous map from a compact topological space into a Hausdorff topological space; if $\sigma$ is one-to-one, then its inverse map $\sigma^{-1}$ is continuous […]. The condition of compactness can be realized by means of a priori bounds […], which require some prior knowledge of, or some constraints on, the solution. The procedure then works with two bounds, one on the solutions and one on the noise $n$:

$$ \|Bf\|_X \le 1, \tag{5} $$

$$ \|n\|_Y \le \varepsilon, \tag{6} $$

where $B$ is a suitable constraint operator. Let us suppose that the eigenfunctions $\{\psi_k\}_{k=1}^{\infty}$ diagonalize the operator $B^*B$, i.e., that $A^*A$ and $B^*B$ commute. In such a case we have $B^*Bf = \sum_k \beta_k^2 f_k \psi_k$, where $f_k = (f, \psi_k)$ and $\beta_k^2$ are the eigenvalues of $B^*B$. The constraint operator $B$ has a compact inverse if and only if $\lim_{k\to\infty}\beta_k = +\infty$. Under this condition, truncate expansion (4) at the largest integer $k$ such that $\lambda_k \ge \varepsilon\beta_k$; the resulting solution converges to $f$ in the $L^2$-norm as $\varepsilon \to 0$. In several cases a much milder constraint can conveniently be used, namely $B = I$ ($\beta_k = 1$ for all $k$). The compactness condition required by the theorem quoted above is then not satisfied. However, we shall prove in Section 2 that the approximation $f^*$, obtained by truncating expansion (4) at the largest $k$ such that $\lambda_k \ge \varepsilon$, still converges to the solution $f$ as $\varepsilon \to 0$, though only in the weak sense.
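The truncation rule can be tried out on a finite section of the eigenfunction expansion. The following is a minimal sketch, not taken from the paper, built on a hypothetical test problem: $\lambda_k = 2^{-k}$, a solution with $\|f\|_X \le 1$ and noise with $\|n\|_Y = \varepsilon$. It compares the full expansion (4) with the truncated $f^*$. The function names `truncation_index` and `truncated_solution` are our own.

```python
import math

def truncation_index(eigs, eps):
    """k0(eps): number of leading eigenvalues with lambda_k >= eps.

    Assumes `eigs` is sorted in nonincreasing order, as in the text.
    """
    k0 = 0
    for lam in eigs:
        if lam < eps:
            break
        k0 += 1
    return k0

def truncated_solution(gbar, eigs, eps):
    """Coefficients of f* = sum_{k <= k0(eps)} (gbar_k / lambda_k) psi_k."""
    k0 = truncation_index(eigs, eps)
    return [g / lam if k < k0 else 0.0
            for k, (g, lam) in enumerate(zip(gbar, eigs))]

def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))

# Hypothetical finite section: N modes, lambda_k = 2**-k.
N, eps = 30, 1e-3
eigs = [0.5 ** k for k in range(1, N + 1)]
f = [0.7 / k for k in range(1, N + 1)]                      # ||f|| < 1
noise = [eps * (-1) ** k / math.sqrt(N) for k in range(N)]  # ||n|| = eps
gbar = [lam * fk + nk for lam, fk, nk in zip(eigs, f, noise)]

naive = [g / lam for g, lam in zip(gbar, eigs)]             # full expansion (4)
fstar = truncated_solution(gbar, eigs, eps)

err_naive = l2_norm([p - q for p, q in zip(naive, f)])
err_trunc = l2_norm([p - q for p, q in zip(fstar, f)])
print(truncation_index(eigs, eps), err_naive, err_trunc)
```

Inverting every component multiplies the noise in mode $k$ by $1/\lambda_k = 2^k$. Truncating at $k_0(\varepsilon)$ keeps the error of order one.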

Hereafter we consider only this last truncation method, and we denote by $k_0(\varepsilon)$ the largest integer $k$ such that $\lambda_k \ge \varepsilon$. We also assume that $\bar g \in \mathrm{Range}(A)$. Since $A$ is compact, the image $Y_0$ of the unit ball of $X$ under $A$ is a compact subset of $Y$, so finite coverings of $Y_0$ can be constructed. We adopt the language of communication theory […] and regard the inverse problem of approximating $f$ from a given $\bar g$ as a communication channel problem. One can then compute the maximal length of the messages conveyed back from $\bar g$ to $f$. We are thus led to relate two quantities: the maximal length of these messages, which depends on the truncation number $k_0(\varepsilon)$, and the massiveness (or degree of compactness) of the set $Y_0$.

The degree of compactness of $Y_0$ turns out to be related to the smoothness of the kernel of the integral operator $A$. Indeed, the asymptotic behavior of the eigenvalues $\lambda_k$ for large $k$ is closely related to the regularity of the kernel. Hille and Tamarkin […] systematically explored the relationship between the regularity of the kernel and the distribution of the eigenvalues of Fredholm integral equations of the first kind. Roughly speaking, as the regularity of the kernel increases (e.g., passing from the class $C^0$ to $C^\infty$ and then to the class of analytic functions), the eigenvalues $\lambda_k$ decrease more and more rapidly as $k \to \infty$. The minimum number of balls in a covering of $Y_0$ and the maximum number of balls in a packing of $Y_0$ […] both give a numerical estimate of the degree of compactness of $Y_0$. Both numbers therefore decrease as the smoothness of the kernel increases.

Finally, the type of continuity restored in reconstructing $f$ from a given $\bar g$ depends on two things: the a priori global bounds imposed on the solution (see formula (5)) and the degree of compactness of $Y_0$. It is accordingly related to the length of the messages conveyed back from $\bar g$ to reconstruct $f$. Since we are concerned with the maximal length of these messages, we are led to consider weak-type convergence in the reconstruction of $f$; this is why we define $k_0(\varepsilon)$ as the largest integer such that $\lambda_k \ge \varepsilon$. A more restrictive constraint would give strong-type convergence, but the messages conveyed back from $\bar g$ for reconstructing $f$ would then be shorter.

The problem of reconstructing $f$ from $\bar g$ can also be reformulated in probabilistic terms, since the data function $\bar g$ is perturbed by the noise $n$, which can properly be regarded as a random variable. One can then rewrite Equation (3) in probabilistic form as

$$ A\xi + \zeta = \eta, \tag{7} $$

where $\xi$, $\zeta$ and $\eta$ correspond to $f$, $n$ and $\bar g$, respectively, and are Gaussian weak random variables […] in the Hilbert space $L^2[a,b]$. By means of orthogonal projections, Equation (7) can be turned into an infinite sequence of one-dimensional equations:

$$ \lambda_k \xi_k + \zeta_k = \eta_k, \qquad k = 1, 2, \ldots, \tag{8} $$

where $\xi_k = (\xi, \psi_k)$, $\zeta_k = (\zeta, \psi_k)$ and $\eta_k = (\eta, \psi_k)$ are Gaussian random variables. With this approach one can evaluate the amount of information $J(\xi_k, \eta_k)$ about the variable $\xi_k$ that is contained in the variable $\eta_k$. This leads to another truncation method, which neglects all components for which $J(\xi_k, \eta_k)$ is less than $\tfrac12 \ln 2$. As Section 3 shows, this criterion yields a truncation number very close to the number $k_0(\varepsilon)$ introduced above. The two procedures therefore yield essentially the same result: the deterministic one, based on the maximal length of the messages conveyed back from $\bar g$ to $f$, and the probabilistic one, based on information theory.
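The information criterion can be illustrated component by component. This is a sketch under assumptions of our own, since Section 3 develops the actual setting. We take independent scalars $\xi_k \sim N(0, s_k^2)$ and $\zeta_k \sim N(0, \varepsilon^2)$ in (8). For this Gaussian channel the standard formula is $J(\xi_k, \eta_k) = \tfrac12 \ln(1 + \lambda_k^2 s_k^2/\varepsilon^2)$ nats, so $J \ge \tfrac12 \ln 2$ exactly when $\lambda_k s_k \ge \varepsilon$. With unit variances this reproduces the rule $\lambda_k \ge \varepsilon$. The helper names are hypothetical.

```python
import math

def component_information(lam, signal_var, noise_var):
    """J(xi_k, eta_k) in nats for eta_k = lam * xi_k + zeta_k, with independent
    zero-mean Gaussians xi_k ~ N(0, signal_var) and zeta_k ~ N(0, noise_var)."""
    return 0.5 * math.log(1.0 + lam * lam * signal_var / noise_var)

def information_truncation(eigs, signal_vars, eps):
    """Number of components kept when every component with J < (1/2) ln 2 is
    neglected, assuming noise of variance eps**2 on each component."""
    threshold = 0.5 * math.log(2.0)
    return sum(1 for lam, s2 in zip(eigs, signal_vars)
               if component_information(lam, s2, eps * eps) >= threshold)

eigs = [0.5 ** k for k in range(1, 41)]
eps = 1e-3
k0 = sum(1 for lam in eigs if lam >= eps)                      # deterministic rule
k_info = information_truncation(eigs, [1.0] * len(eigs), eps)  # unit signal variance
print(k0, k_info)
```

With larger prior variances $s_k^2$ the probabilistic rule keeps a few more components than $k_0(\varepsilon)$.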

Information theory, or the theory of coding, arose from Shannon's fundamental paper of 1948 […]. It might more correctly be called statistical communication theory. The information source is any producer of information according to some known probability law, and this information has to be communicated to the destination through a transmission channel. Noise is anything that impairs the ability of the channel to transmit with complete reliability. Information theory is concerned with methods for achieving high reliability without reducing the transmission rate too drastically. The mathematical theory of information was subsequently extended by several authors, notably Kolmogorov and Gelfand (see, in particular, […] and the papers quoted therein).

A natural question then arises. On the one hand, information theory is formulated within probability theory and uses its language and tools. On the other hand, the concept of information can be regarded as more basic than, and independent of, probability […]. The problem then becomes how to construct a nonprobabilistic theory of information. To this end, Kolmogorov and his school introduced and developed an alternative approach to the quantitative definition of information that is logically independent of probabilistic assumptions: information is measured in purely combinatorial terms […]. This combinatorial, or metric, approach eventually led to the theory of the $\varepsilon$-entropy and $\varepsilon$-capacity of sets in metric spaces […].

The connection between the concepts of Shannon's information theory, in particular the length of a message in binary units, and the concepts of $\varepsilon$-entropy and $\varepsilon$-capacity is illustrated in detail in […], to which we refer the interested reader. (In this connection we also mention […], where the $\varepsilon$-entropy plays a crucial role in the estimation of empirical processes.) With a slight abuse of language, we call metric information the information induced by the $\varepsilon$-capacity, which is indeed defined as the number of binary signs that can be reliably transmitted. There remains the problem of comparing the results of probabilistic information theory with those of nonprobabilistic, or metric, information theory. The main aim of this paper is to give a partial answer to this question for Fredholm integral equations of the first kind.

The paper is organized as follows. In Section 2 we first prove that the approximation $f^*$ converges weakly to $f$ as $\varepsilon \to 0$. We then find an upper and a lower bound for the $\varepsilon$-entropy of the image of the unit ball of the solution space under $A$. Next, we evaluate explicitly an upper bound for the maximal length of the messages conveyed back from $\bar g$ to reconstruct $f$; this provides an estimate of what we call metric information. Explicit calculations are given in three specific cases: harmonic continuation, backward solution of the heat equation, and a first kind Fredholm integral equation with continuous kernel. In Section 3 we reconsider the problem from a probabilistic viewpoint. We introduce another truncation method, based on probabilistic information theory, and derive an approximation that converges to the solution in the probabilistic sense, under suitable conditions on the covariance operator of the solution.


2. Metric Information Associated with Fredholm Integral Equations of the First Kind


2.1. Weak convergence of the $f^*$ approximation. Let us consider the approximation

$$ f^* = \sum_{k=1}^{k_0(\varepsilon)} \frac{\bar g_k}{\lambda_k}\,\psi_k, $$

where $k_0(\varepsilon)$ is the largest integer such that $\lambda_k \ge \varepsilon$. We want to prove that $f^*$ converges weakly to $f$ as $\varepsilon \to 0$, which gives weak continuity of the restored solution. For this purpose we need the following auxiliary lemma.


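Lemma 1. For any function $f$ which satisfies the bounds

$$ \|Af - \bar g\|_{Y = L^2[a,b]} \le \varepsilon, \tag{9} $$

$$ \|f\|_{X = L^2[a,b]} \le 1, \tag{10} $$

the following inequalities hold:

$$ \|A(f - f^*)\|_Y \le \sqrt{2}\,\varepsilon, \tag{11} $$

$$ \|f - f^*\|_X \le \sqrt{2}, \tag{12} $$

$$ \|A(f - f^*)\|_Y^2 + \varepsilon^2 \|f - f^*\|_X^2 \le 4\varepsilon^2. \tag{13} $$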


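Proof. (a) From the inequality $\lambda_k < \varepsilon$ for $k > k_0$ and the bound $\|f\|_X \le 1$ it follows that

$$ \sum_{k=k_0+1}^{\infty} \lambda_k^2 |f_k|^2 \le \varepsilon^2. \tag{14} $$

From $\|Af - \bar g\|_Y \le \varepsilon$ we get

$$ \sum_{k=1}^{\infty} \lambda_k^2 \left| f_k - \frac{\bar g_k}{\lambda_k} \right|^2 \le \varepsilon^2. \tag{15} $$

Therefore we have

$$ \|A(f - f^*)\|_Y^2 = \sum_{k=1}^{k_0} \lambda_k^2 \left| f_k - \frac{\bar g_k}{\lambda_k} \right|^2 + \sum_{k=k_0+1}^{\infty} \lambda_k^2 |f_k|^2 \le 2\varepsilon^2, \tag{16} $$

and inequality (11) is proved.

(b) From the inequality $\lambda_k \ge \varepsilon$ for $k \le k_0$ and the bound $\|Af - \bar g\|_Y \le \varepsilon$ we obtain

$$ \sum_{k=1}^{k_0} \left| f_k - \frac{\bar g_k}{\lambda_k} \right|^2 = \sum_{k=1}^{k_0} \frac{1}{\lambda_k^2} \left| \lambda_k f_k - \bar g_k \right|^2 \le 1. \tag{17} $$

From $\|f\|_X \le 1$ it follows that

$$ \sum_{k=k_0+1}^{\infty} |f_k|^2 \le 1. \tag{18} $$

Therefore we have

$$ \|f - f^*\|_X^2 = \sum_{k=1}^{k_0} \left| f_k - \frac{\bar g_k}{\lambda_k} \right|^2 + \sum_{k=k_0+1}^{\infty} |f_k|^2 \le 2, \tag{19} $$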
and inequality (12) is proved. Next, from (16) and (19) we obtain

$$ \|A(f - f^*)\|_Y^2 + \varepsilon^2 \|f - f^*\|_X^2 \le 4\varepsilon^2, \tag{20} $$

that is, inequality (13). □
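Lemma 1 can be spot-checked numerically on a finite section of the eigenbasis. This is a sketch under assumptions of our own: the eigenvalues are $\lambda_k = 1/(k^2\pi^2)$, and $f$ and $n$ are random draws rescaled so that $\|f\|_X \le 1$ and $\|n\|_Y \le \varepsilon$. Then (9) and (10) hold, and the ratios of the left-hand sides of (11) to (13) to their right-hand sides must never exceed 1. The function name `lemma1_quantities` is hypothetical.

```python
import math
import random

def lemma1_quantities(eigs, eps, rng):
    """Draw f with ||f|| <= 1 and noise n with ||n|| <= eps on a finite section of
    the eigenbasis, form gbar = A f + n, truncate at k0(eps) and return the
    left-hand sides of (11), (12) and (13)."""
    def rescaled(vec, radius):
        norm = math.sqrt(sum(t * t for t in vec))
        scale = radius * rng.random() / norm          # final norm <= radius
        return [t * scale for t in vec]

    f = rescaled([rng.gauss(0.0, 1.0) for _ in eigs], 1.0)
    n = rescaled([rng.gauss(0.0, 1.0) for _ in eigs], eps)
    gbar = [lam * fk + nk for lam, fk, nk in zip(eigs, f, n)]
    k0 = sum(1 for lam in eigs if lam >= eps)         # eigs are nonincreasing
    diff = [fk - (g / lam if k < k0 else 0.0)
            for k, (fk, g, lam) in enumerate(zip(f, gbar, eigs))]
    a_err2 = sum((lam * d) ** 2 for lam, d in zip(eigs, diff))  # ||A(f - f*)||^2
    x_err2 = sum(d * d for d in diff)                           # ||f - f*||^2
    return math.sqrt(a_err2), math.sqrt(x_err2), a_err2 + eps ** 2 * x_err2

eigs = [1.0 / (k * math.pi) ** 2 for k in range(1, 51)]
rng = random.Random(0)
worst = [0.0, 0.0, 0.0]
for eps in (1e-2, 1e-3, 1e-4):
    for _ in range(200):
        a, x, c = lemma1_quantities(eigs, eps, rng)
        worst[0] = max(worst[0], a / (math.sqrt(2) * eps))   # (11): ratio <= 1
        worst[1] = max(worst[1], x / math.sqrt(2))           # (12): ratio <= 1
        worst[2] = max(worst[2], c / (4 * eps ** 2))         # (13): ratio <= 1
print(worst)
```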

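Let us note that $\lim_{\varepsilon\to 0} k_0(\varepsilon) = +\infty$. This follows from the definition of $k_0(\varepsilon)$, since $\lambda_k > 0$ for every $k$ ($A$ being injective). The fact that $\lim_{k\to\infty}\lambda_k = 0$ ensures, in turn, that $k_0(\varepsilon)$ is finite. Next we prove the following theorem.

Theorem 1. For any function $f$ which satisfies bounds (9) and (10), the following limit holds:

$$ \lim_{\varepsilon\to 0} (f - f^*, v)_X = 0, \qquad \forall v \in X,\ \|v\|_X \le 1. $$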

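Proof. Let us put $x_k = f_k - (f^*)_k$; then we have

$$ (f - f^*, v)_X = \sum_{k=1}^{\infty} x_k \overline{v_k}, \qquad \left( \sum_{k=1}^{\infty} |v_k|^2 \le 1 \right). \tag{21} $$

Next, by the Schwarz inequality and bound (13), we have

$$ |(f - f^*, v)_X| \le \sum_{k=1}^{\infty} |x_k v_k| = \sum_{k=1}^{\infty} \left( \frac{\lambda_k^2 + \varepsilon^2}{\lambda_k^2 + \varepsilon^2} \right)^{1/2} |x_k v_k| \le \left( \|A(f-f^*)\|_Y^2 + \varepsilon^2 \|f - f^*\|_X^2 \right)^{1/2} \left( \sum_{k=1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \right)^{1/2} \le \left( 4\varepsilon^2 \sum_{k=1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \right)^{1/2}. \tag{22} $$

Next we split the sum $\sum_{k=1}^{\infty} |v_k|^2/(\lambda_k^2 + \varepsilon^2)$ into two parts:

$$ \sum_{k=1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} = \sum_{k=1}^{k_0} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} + \sum_{k=k_0+1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2}. \tag{23} $$

Since $\lambda_k \ge \varepsilon$ for $k \le k_0$, the first term of sum (23) can be majorized as follows:

$$ \sum_{k=1}^{k_0} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \le \frac{1}{2\varepsilon^2} \sum_{k=1}^{k_0} |v_k|^2 \le \frac{1}{2\varepsilon^2}. \tag{24} $$

From formulae (21) and (24) we have

$$ 4\varepsilon^2 \sum_{k=1}^{k_0} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \le 2. \tag{25} $$

Furthermore,

$$ \lim_{\varepsilon\to 0} \frac{4\varepsilon^2 |v_k|^2}{\lambda_k^2 + \varepsilon^2} = 0 \qquad \text{for each fixed } k. \tag{26} $$

Each term of the sum in (25) is bounded by $2|v_k|^2$, and $\sum_k |v_k|^2 \le 1$. Hence, by dominated convergence,

$$ \lim_{\varepsilon\to 0} 4\varepsilon^2 \sum_{k=1}^{k_0} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} = 0. \tag{27} $$

Let us now consider the second term of sum (23). Since $\lambda_k^2 + \varepsilon^2 \ge \varepsilon^2$, we can write

$$ \sum_{k=k_0+1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \le \frac{1}{\varepsilon^2} \sum_{k=k_0+1}^{\infty} |v_k|^2. \tag{28} $$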
Therefore, from (28) we get

$$ 4\varepsilon^2 \sum_{k=k_0+1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} \le 4 \sum_{k=k_0+1}^{\infty} |v_k|^2. \tag{29} $$

Since $\lim_{\varepsilon\to 0} k_0(\varepsilon) = +\infty$ and $\sum_k |v_k|^2$ converges, we can conclude that

$$ \lim_{\varepsilon\to 0} 4\varepsilon^2 \sum_{k=k_0+1}^{\infty} \frac{|v_k|^2}{\lambda_k^2 + \varepsilon^2} = 0. \tag{30} $$

From (22), (23), (27) and (30) we then obtain

$$ \lim_{\varepsilon\to 0} (f - f^*, v)_X = 0, \qquad \forall v \in X,\ \|v\|_X \le 1, \tag{31} $$

and the theorem is proved. □
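A simple hypothetical example, not taken from the paper, shows why Theorem 1 cannot be strengthened to norm convergence without further constraints. Take $\lambda_k = 2^{-k}$, $\varepsilon = 2^{-m}$, $f = 0$ and $n = \varepsilon\,\psi_m$, so that (9) and (10) hold. Then $k_0(\varepsilon) = m$ and $f^* = \psi_m$. Hence $\|f - f^*\|_X = 1$ for every $m$, while $(f - f^*, v)_X = -v_m \to 0$ for any fixed $v$. The sketch below checks this on a finite section; the test vector $v_k = 2^{-k/2}$ is an arbitrary choice.

```python
import math

def weak_vs_strong(m, n_modes=60):
    """Hypothetical worst case: lambda_k = 2**-k, eps = 2**-m, f = 0 and
    noise n = eps * psi_m (so ||n|| = eps and ||f|| = 0 <= 1)."""
    eps = 2.0 ** -m
    eigs = [2.0 ** -k for k in range(1, n_modes + 1)]
    f = [0.0] * n_modes
    noise = [eps if k == m else 0.0 for k in range(1, n_modes + 1)]
    gbar = [lam * fk + nk for lam, fk, nk in zip(eigs, f, noise)]
    k0 = sum(1 for lam in eigs if lam >= eps)
    fstar = [g / lam if k <= k0 else 0.0
             for k, (g, lam) in enumerate(zip(gbar, eigs), start=1)]
    v = [2.0 ** (-k / 2) for k in range(1, n_modes + 1)]   # fixed v with ||v|| < 1
    diff = [p - q for p, q in zip(f, fstar)]
    strong = math.sqrt(sum(d * d for d in diff))           # ||f - f*||_X
    weak = abs(sum(d * vk for d, vk in zip(diff, v)))      # |(f - f*, v)_X|
    return k0, strong, weak

for m in (5, 10, 20, 40):
    print(m, weak_vs_strong(m))
```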


2.2. $\varepsilon$-entropy and $\varepsilon$-capacity associated with the operator $A$. Let us consider the unit ball in the solution space $X = L^2[a,b]$, i.e., $\{f \in X : \|f\|_X \le 1\}$. The operator $A$ maps the unit ball onto a compact ellipsoid $\mathcal{E} \subset \mathrm{Range}(A)$ contained in $Y = L^2[a,b]$, whose semi-axes have lengths equal to the eigenvalues $\lambda_k$ of $A$. To estimate numerically the massiveness of the set $\mathcal{E}$, let us first recall some basic definitions […]:

(a) A family $Y_1, \ldots, Y_n$ of subsets of $Y$ is an $\varepsilon$-covering of $\mathcal{E}$ if the diameter of each $Y_k$ does not exceed $2\varepsilon$ and the sets $Y_k$ cover $\mathcal{E}$: $\mathcal{E} \subset \bigcup_{k=1}^{n} Y_k$.

(b) Points $y_1, \ldots, y_m$ of $\mathcal{E}$ are called $\varepsilon$-distinguishable if the distance between any two of them exceeds $\varepsilon$.

Since $\mathcal{E}$ is compact, there exists a finite $\varepsilon$-covering for each $\varepsilon > 0$, and $\mathcal{E}$ can contain only finitely many $\varepsilon$-distinguishable points. For a given $\varepsilon > 0$, the number $n$ of sets $Y_k$ in a covering family depends on the family. However, the minimal value $N_\varepsilon(\mathcal{E}) = \min n$ is an invariant of the set $\mathcal{E}$ that depends only on $\varepsilon$. Throughout the paper, $\log x$ denotes the logarithm of $x$ to the base 2. The function $H_\varepsilon(\mathcal{E}) = \log N_\varepsilon(\mathcal{E})$ is called the $\varepsilon$-entropy of the set $\mathcal{E}$. Analogously, the number $m$ in definition (b) depends on the choice of the points, but its maximum $M_\varepsilon(\mathcal{E}) = \max m$ is an invariant of $\mathcal{E}$. The function $C_\varepsilon(\mathcal{E}) = \log M_\varepsilon(\mathcal{E})$ is called the $\varepsilon$-capacity of the set $\mathcal{E}$. It is the logarithm of the maximum number of $\varepsilon$-distinguishable signals that can be received, that is, of data satisfying $\|g^{(i)} - g^{(k)}\|_Y > \varepsilon$ for all $i \ne k$, $g^{(i)}, g^{(k)} \in \mathcal{E}$.

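A general result about $\varepsilon$-entropy and $\varepsilon$-capacity is given by the following inequalities […]:

$$ H_\varepsilon(\mathcal{E}) \le C_\varepsilon(\mathcal{E}) \le H_{\varepsilon/2}(\mathcal{E}). \tag{32} $$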
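Definition (b) can be made concrete with a greedy construction. The sketch below is ours and is only an illustration. It samples points of a hypothetical three-dimensional ellipsoid with semi-axes $1, 0.5, 0.25$ and keeps a point only if it lies farther than $\varepsilon$ from every point already kept. The selected points are $\varepsilon$-distinguishable and lie in $\mathcal{E}_3 \subset \mathcal{E}$, so the base-2 logarithm of their number is a lower bound for $C_\varepsilon(\mathcal{E})$. The selection is also maximal: every sampled point lies within $\varepsilon$ of a selected one.

```python
import math
import random

def greedy_separated_subset(points, eps):
    """Greedily keep points whose pairwise distances all exceed eps.

    The result is eps-distinguishable in the sense of definition (b); it is also
    maximal, so every input point lies within eps of some kept point.
    """
    chosen = []
    for p in points:
        if all(math.dist(p, q) > eps for q in chosen):
            chosen.append(p)
    return chosen

def sample_ellipsoid(axes, count, rng):
    """Rejection-sample points of the ellipsoid sum_k (x_k / axes_k)**2 <= 1."""
    pts = []
    while len(pts) < count:
        x = [rng.uniform(-a, a) for a in axes]
        if sum((xk / a) ** 2 for xk, a in zip(x, axes)) <= 1.0:
            pts.append(tuple(x))
    return pts

rng = random.Random(1)
axes = [1.0, 0.5, 0.25]            # hypothetical first three semi-axes lambda_k
pts = sample_ellipsoid(axes, 2000, rng)
eps = 0.2
chosen = greedy_separated_subset(pts, eps)
print(len(chosen), math.log2(len(chosen)))   # log2 |chosen| <= C_eps
```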

To estimate the $\varepsilon$-capacity $C_\varepsilon(\mathcal{E})$, we now look for a lower bound for $H_\varepsilon(\mathcal{E})$ and an upper bound for $H_{\varepsilon/2}(\mathcal{E})$. For this purpose, let $Y_{k_0}$ be the finite-dimensional subspace of $Y$ spanned by the first $k_0$ axes of $\mathcal{E}$, and put $\mathcal{E}_{k_0} = \mathcal{E} \cap Y_{k_0}$. Then $\mathcal{E}_{k_0}$ is a finite-dimensional ellipsoid whose volume is $\prod_{k=1}^{k_0} \lambda_k$ times the volume $\Omega_{k_0}$ of the unit ball in $Y_{k_0}$. Since the volume of an $\varepsilon$-ball in $Y_{k_0}$ is $\varepsilon^{k_0}\Omega_{k_0}$, covering the ellipsoid $\mathcal{E}$ by $\varepsilon$-balls requires at least $\prod_{k=1}^{k_0} (\lambda_k/\varepsilon)$ such balls. It follows that

$$ \prod_{k=1}^{k_0} \frac{\lambda_k}{\varepsilon} \le N_\varepsilon(\mathcal{E}), \tag{33} $$

which gives the following lower bound for the $\varepsilon$-entropy $H_\varepsilon(\mathcal{E})$:

$$ \sum_{k=1}^{k_0} \log \frac{\lambda_k}{\varepsilon} \le \log N_\varepsilon(\mathcal{E}) = H_\varepsilon(\mathcal{E}). \tag{34} $$


An upper bound for $H_{\varepsilon/2}(\mathcal{E})$ can be found as follows […]. Construct in $Y_{k_0}$ the cubical lattice with mesh width $\varepsilon_1 = \varepsilon/(2\sqrt{k_0})$, taking the axes of $\mathcal{E}_{k_0}$ as coordinate axes. With this choice of $\varepsilon_1$, any point of $Y_{k_0}$, and in particular of $\mathcal{E}_{k_0}$, lies within a distance $\tfrac12 \varepsilon_1 \sqrt{k_0} = \varepsilon/4$ of the nearest point of this lattice. In particular, it lies within a distance $\varepsilon/4$ of one of the lattice points contained in the parallelepiped $P_{k_0}$ defined by

$$ -\lambda_k - \frac{\varepsilon}{4} \le x_k \le \lambda_k + \frac{\varepsilon}{4}, \qquad 1 \le k \le k_0. \tag{35} $$

Now let $k_0 = k_0(\varepsilon/4)$, i.e., let $k_0$ be the number of terms of the sequence $\{\lambda_k\}$ not smaller than $\varepsilon/4$. Then every point $x \in \mathcal{E}$ lies within a distance $\varepsilon/4$ of a point of $\mathcal{E}_{k_0}$. Indeed, write $x = \sum_k x_k \psi_k$, where $\{\psi_k\}$ is the orthonormal basis of $Y$ formed by the eigenfunctions of $A$. Since $x$ belongs to $\mathcal{E}$, we have $\sum_{k=1}^{\infty} |x_k/\lambda_k|^2 \le 1$. Hence the square of the distance from $x$ to $\mathcal{E}_{k_0}$ is


d ( x , £fc 0 ) 


E 

k=ko~\~l 


M 2 = V 


k=ko-\-l 


A l 


Xk_ 
^k 


^ A feo+i E 

&=i 




(j) a - (“> 


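The balls of radius $\varepsilon/2$ centered at the lattice points in $P_{k_0}$ cover the ellipsoid $\mathcal{E}$. Indeed, by (36) each point of $\mathcal{E}$ lies within a distance $\varepsilon/4$ of $\mathcal{E}_{k_0}$, and each point of $\mathcal{E}_{k_0}$ lies within a distance $\varepsilon/4$ of some lattice point in $P_{k_0}$. Hence each point of $\mathcal{E}$ lies within a distance $\varepsilon/2$ of some lattice point in $P_{k_0}$. Counting at most $(2\lambda_k + \varepsilon/2)/\varepsilon_1 + 1$ lattice points along the $k$-th axis, the number of lattice points in $P_{k_0}$ is not greater than

$$ \prod_{k=1}^{k_0} \left( \frac{4\lambda_k\sqrt{k_0}}{\varepsilon} + \sqrt{k_0} + 1 \right) \le \left( \frac{6\sqrt{k_0}}{\varepsilon} \right)^{k_0}, \tag{37} $$

where we used the assumption $\varepsilon < \lambda_1 \le 1 \le k_0$. Since $k_0 = k_0(\varepsilon/4)$, the number of elements in this $(\varepsilon/2)$-covering is at most $\left(6\sqrt{k_0(\varepsilon/4)}/\varepsilon\right)^{k_0(\varepsilon/4)}$. Taking the logarithm, we finally obtain

$$ H_{\varepsilon/2}(\mathcal{E}) \le k_0\!\left(\frac{\varepsilon}{4}\right) \log \frac{6\sqrt{k_0(\varepsilon/4)}}{\varepsilon} = k_0\!\left(\frac{\varepsilon}{4}\right) \left[ \log\frac{1}{\varepsilon} + \log 6 + \frac12 \log k_0\!\left(\frac{\varepsilon}{4}\right) \right]. \tag{38} $$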
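The two explicit bounds, (34) for $H_\varepsilon$ and (38) for $H_{\varepsilon/2}$, together with the lattice count (37), are easy to evaluate for a given eigenvalue sequence. The sketch below is ours; it assumes the list of eigenvalues is nonincreasing, bounded by 1 and long enough to contain every $\lambda_k \ge \varepsilon/4$. The three eigenvalue families are hypothetical examples mimicking the decay laws treated below. Since $H_\varepsilon \le C_\varepsilon \le H_{\varepsilon/2}$ by (32), the lower bound must never exceed the upper one.

```python
import math

def k0(eigs, eps):
    """Number of eigenvalues >= eps (eigs assumed nonincreasing and long enough)."""
    return sum(1 for lam in eigs if lam >= eps)

def entropy_lower_bound(eigs, eps):
    """Left-hand side of (34): sum_{k <= k0(eps)} log2(lambda_k / eps) <= H_eps."""
    return sum(math.log2(lam / eps) for lam in eigs[:k0(eigs, eps)])

def entropy_half_upper_bound(eigs, eps):
    """Right-hand side of (38), bounding H_{eps/2}; needs eps < lambda_1 so that m >= 1."""
    m = k0(eigs, eps / 4)
    return m * (math.log2(1 / eps) + math.log2(6) + 0.5 * math.log2(m))

def lattice_points_log2(eigs, eps):
    """log2 of the left-hand side of (37), with k0 = k0(eps/4); logs avoid overflow."""
    m = k0(eigs, eps / 4)
    return sum(math.log2(4 * lam * math.sqrt(m) / eps + math.sqrt(m) + 1)
               for lam in eigs[:m])

families = {
    "exponential": [0.5 ** k for k in range(1, 200)],
    "gaussian": [math.exp(-0.1 * k * k) for k in range(1, 200)],
    "inverse square": [1 / (k * math.pi) ** 2 for k in range(1, 200)],
}
for name, eigs in families.items():
    eps = 1e-4
    print(name, entropy_lower_bound(eigs, eps), entropy_half_upper_bound(eigs, eps))
```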

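For the next step we note that $H_\varepsilon(\mathcal{E})$ is nondecreasing as $\varepsilon \to 0$. We can therefore introduce the order of growth $\rho(\mathcal{E})$ of the entropy $H_\varepsilon(\mathcal{E})$:

$$ \rho(\mathcal{E}) = \limsup_{\varepsilon\to 0} \frac{\log H_\varepsilon(\mathcal{E})}{\log(1/\varepsilon)}, \tag{39} $$

or, in the case $\rho(\mathcal{E}) = 0$, the logarithmic order of growth $\sigma(\mathcal{E})$ of $H_\varepsilon(\mathcal{E})$, which reads

$$ \sigma(\mathcal{E}) = \limsup_{\varepsilon\to 0} \frac{\log H_\varepsilon(\mathcal{E})}{\log\log(1/\varepsilon)}. \tag{40} $$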
We want to relate the asymptotic behavior of $H_\varepsilon(\mathcal{E})$ as $\varepsilon \to 0$ to that of the semi-axes $\{\lambda_k\}$ of $\mathcal{E}$ as $k \to \infty$. We are therefore led to introduce the exponent of convergence $\lambda$ and the logarithmic exponent of convergence $\mu$ of the sequence $\{1/\lambda_k\}$ (see […]):

$$ \lambda = \limsup_{\varepsilon\to 0} \frac{\log k_0(\varepsilon)}{\log(1/\varepsilon)}, \tag{41} $$

$$ \mu = \limsup_{\varepsilon\to 0} \frac{\log k_0(\varepsilon)}{\log\log(1/\varepsilon)}, \tag{42} $$

where $k_0(\varepsilon)$ denotes the number of elements of the sequence $\{\lambda_k\}$ not smaller than $\varepsilon$. The following relationships are proved in […]: $\rho(\mathcal{E}) = \lambda$, and if $\rho(\mathcal{E}) = \lambda = 0$, then $\sigma(\mathcal{E}) = \mu + 1$. Finally, we define the degree of compactness of the range of $A$ as $d_c = 1/\rho$ (if $\rho \ne 0$), and the exponential degree of compactness of $\mathrm{Range}(A)$ as $d_c^e = 2^{1/\sigma}$ (if $\rho = 0$).
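The quotients in (41) and (42) can be evaluated at finite $\varepsilon$ to see how the exponents emerge. The following sketch is our own illustration, using two hypothetical eigenvalue laws. For $\lambda_k = 1/(k^2\pi^2)$ the first quotient approaches $\lambda = 1/2$, slowly from below. For $\lambda_k = 2^{-k}$ the second quotient approaches $\mu = 1$. The quotients are only finite-$\varepsilon$ proxies for the limits superior.

```python
import math

def k0_count(lam, eps):
    """k0(eps) for a decreasing eigenvalue law lam(k), k = 1, 2, ..., by direct counting."""
    k = 0
    while lam(k + 1) >= eps:
        k += 1
    return k

def convergence_ratios(lam, eps):
    """Finite-eps versions of the quotients in (41) and (42), with base-2 logarithms."""
    n = k0_count(lam, eps)
    big_l = math.log2(1 / eps)
    return math.log2(n) / big_l, math.log2(n) / math.log2(big_l)

def inverse_square(k):
    return 1 / (k * math.pi) ** 2     # first quotient tends to 1/2

def geometric(k):
    return 2.0 ** -k                  # second quotient tends to 1

for eps in (1e-8, 1e-12):
    print(eps, convergence_ratios(inverse_square, eps), convergence_ratios(geometric, eps))
```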

Using bounds (34) and (38), we can now evaluate the degree of compactness of $\mathrm{Range}(A)$ in three specific examples: harmonic continuation, backward solution of the heat equation, and a first kind Fredholm integral equation with continuous kernel. In all these examples the eigenvalues decay according to a single law that holds uniformly for all $k$.

2.2.1. Harmonic continuation. Let us consider a family $\mathcal{F}$ of functions $u(r,\theta)$ which satisfy the Laplace equation in the interior of the unit disk. We want to determine $u(b,\theta)$ ($b < 1$), assuming that $u(a,\theta)$ ($a < b$) is known within a certain approximation. The solution is obtained by solving the following Fredholm integral equation:

$$ u(a,\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\theta - \phi)\, u(b,\phi)\, d\phi, \qquad -\pi \le \theta \le \pi, \tag{43} $$

where $P(\theta - \phi)$ is the Poisson kernel, given by

$$ P(\theta - \phi) = \sum_{k=-\infty}^{+\infty} \left( \frac{a}{b} \right)^{|k|} e^{ik(\theta - \phi)}. \tag{44} $$

Equation (43) has the form (1), $Af = g$, with $f(\phi) = u(b,\phi)$ and $g(\theta) = u(a,\theta)$ ($b > a$). The function $u(b,\phi)$ is the restriction to the circle of radius $b$ of a function harmonic in the unit disk whose boundary values belong to $L^2[-\pi,\pi]$. The following expansion therefore converges in the $L^2$-norm:

$$ u(1,\theta) = \sum_{k=-\infty}^{+\infty} u_k e^{ik\theta}, \qquad \left( \sum_{k=-\infty}^{+\infty} |u_k|^2 < \infty \right). \tag{45} $$

Furthermore, we have the uniformly convergent expansion

$$ u(b,\theta) = \sum_{k=-\infty}^{+\infty} b^{|k|} u_k e^{ik\theta}. \tag{46} $$

The eigenvalues of the operator $A$ are $\lambda_k = (a/b)^{|k|}$, $b > a$, and the eigenfunctions are $\psi_k(\theta) = e^{ik\theta}$ (up to the normalization factor $1/\sqrt{2\pi}$). Clearly, $\lim_{|k|\to\infty}\lambda_k = 0$. $\mathrm{Range}(A)$ is not closed in $L^2[-\pi,\pi]$. Indeed, a data function $g = u(a,\cdot)$, with Fourier coefficients $u_k a^{|k|}$, belongs to $\mathrm{Range}(A)$ only if

$$ \sum_{k=-\infty}^{+\infty} \left( |u_k|\, a^{|k|} \left(\frac{b}{a}\right)^{|k|} \right)^2 < \infty. \tag{47} $$

If a noise $n$ is added to the data function $g$, the function actually known is $\bar g = g + n$, which in general does not belong to $\mathrm{Range}(A)$. Nevertheless, hereafter we still assume that $\bar g \in \mathrm{Range}(A)$. Next, we restrict the solution space to those functions which satisfy the bound

$$ \sum_{k=-\infty}^{+\infty} \left( |u_k|\, b^{|k|} \right)^2 \le 1. \tag{48} $$


The truncation number $k_0(\varepsilon)$ is the largest integer such that $\lambda_k \ge \varepsilon$, and it is now easy to evaluate:

$$ k_0(\varepsilon) = \left[ \frac{\log(1/\varepsilon)}{\log(b/a)} \right], \tag{49} $$

where $[\cdot]$ denotes the integer part. We now split the sums (45) to (48) into two parts: the first with $k$ running from $0$ to $+\infty$, the second with $k$ running from $-1$ to $-\infty$. We denote by $H_\varepsilon^{(+)}(\mathcal{E})$ ($C_\varepsilon^{(+)}(\mathcal{E})$) the $\varepsilon$-entropy ($\varepsilon$-capacity) associated with the truncation of the first sum. Accordingly, $H_\varepsilon^{(-)}(\mathcal{E})$ ($C_\varepsilon^{(-)}(\mathcal{E})$) denotes the $\varepsilon$-entropy ($\varepsilon$-capacity) associated with the truncation of the second sum.
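Formula (49) can be checked against the definition of $k_0(\varepsilon)$ by direct counting. The sketch below does this for a few hypothetical radii and noise levels chosen away from the integer boundaries. Because (49) is a ratio of logarithms, the base does not matter.

```python
import math

def k0_direct(ratio, eps):
    """Largest k >= 0 with ratio**k >= eps (ratio = a/b < 1), by direct counting."""
    k = 0
    while ratio ** (k + 1) >= eps:
        k += 1
    return k

def k0_formula(a, b, eps):
    """Formula (49): integer part of log(1/eps) / log(b/a), in any log base."""
    return math.floor(math.log(1 / eps) / math.log(b / a))

radii = ((0.3, 0.6), (0.2, 0.9), (0.5, 0.8))   # hypothetical pairs a < b < 1
levels = (1e-2, 1e-3, 1e-5, 3e-7)
for a, b in radii:
    for eps in levels:
        print(a, b, eps, k0_direct(a / b, eps), k0_formula(a, b, eps))
```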

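Using formulae (34) and (38) and inequality (32), we then obtain

$$ \sum_{k=1}^{k_0(\varepsilon)} \log\frac{\lambda_k}{\varepsilon} \le H_\varepsilon^{(+)}(\mathcal{E}) \le C_\varepsilon^{(+)}(\mathcal{E}) \le H_{\varepsilon/2}^{(+)}(\mathcal{E}) \le k_0\!\left(\frac{\varepsilon}{4}\right)\left[\log\frac{1}{\varepsilon} + \log 6 + \frac12\log k_0\!\left(\frac{\varepsilon}{4}\right)\right] \le \frac{2 + \log(1/\varepsilon)}{\log(b/a)}\left[\log\frac{1}{\varepsilon} + \log 6 + \frac12\log k_0\!\left(\frac{\varepsilon}{4}\right)\right]. \tag{50} $$

As $\varepsilon \to 0$, the leading term on the r.h.s. of (50) is

$$ k_0\!\left(\frac{\varepsilon}{4}\right)\log\frac{1}{\varepsilon} \sim k_0(\varepsilon)\log\frac{1}{\varepsilon} \sim \frac{[\log(1/\varepsilon)]^2}{\log(b/a)}. \tag{51} $$

On the l.h.s. of (50) we have $\log(\lambda_k/\varepsilon) = \log(1/\varepsilon) - k\log(b/a)$, so the sum equals $k_0\log(1/\varepsilon) - \tfrac12 k_0(k_0+1)\log(b/a)$. Since $k_0(\varepsilon)\log(b/a) \sim \log(1/\varepsilon)$, its leading term becomes

$$ \frac12\, k_0(\varepsilon)\log\frac{1}{\varepsilon}. \tag{52} $$

For $\varepsilon$ sufficiently small we thus obtain fairly sharp inequalities for the $\varepsilon$-capacity (up to lower-order terms):

$$ \frac12\, k_0(\varepsilon)\log\frac{1}{\varepsilon} \lesssim C_\varepsilon^{(+)}(\mathcal{E}) \lesssim k_0(\varepsilon)\log\frac{1}{\varepsilon}. \tag{53} $$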

We thus have an upper bound, associated with the truncation of the positive sum, for the maximal length in binary units of the messages conveyed back from $\bar g$ to reconstruct $f$. With obvious notation, we obtain

$$ L_{\max}^{(+)}(\varepsilon) \le 2^{k_0(\varepsilon)\log(1/\varepsilon)}, \qquad k_0(\varepsilon)\log\frac{1}{\varepsilon} \underset{\varepsilon\to 0}{\sim} \frac{[\log(1/\varepsilon)]^2}{\log(b/a)}. \tag{54} $$

Finally, for the total maximal length we obtain

$$ L_{\max}(\varepsilon) = L_{\max}^{(+)}(\varepsilon) + L_{\max}^{(-)}(\varepsilon) \le 2^{k_0(\varepsilon)\log(1/\varepsilon) + 1}, \qquad \log L_{\max}(\varepsilon) \le k_0(\varepsilon)\log\frac{1}{\varepsilon} + 1 \underset{\varepsilon\to 0}{\sim} \frac{[\log(1/\varepsilon)]^2}{\log(b/a)}. \tag{55} $$

This can be taken as a quantitative estimate of the metric information.
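The sandwich (53) can be observed numerically. The sketch below is ours. For hypothetical radii with $b/a = 2$, it evaluates the outer bounds of (50) and divides them by the leading term $k_0(\varepsilon)\log(1/\varepsilon)$. As $\varepsilon$ decreases, the lower ratio approaches $1/2$ and the upper ratio slowly approaches $1$, as stated in (52) and (51).

```python
import math

def harmonic_capacity_ratios(a, b, eps):
    """Outer bounds of (50), positive-index part, lambda_k = (a/b)**k,
    each divided by the leading term k0(eps) * log2(1/eps)."""
    q = b / a
    k0 = math.floor(math.log(1 / eps) / math.log(q))
    k0_quarter = math.floor(math.log(4 / eps) / math.log(q))
    big_l = math.log2(1 / eps)
    lower = sum(big_l - k * math.log2(q) for k in range(1, k0 + 1))  # sum log2(lambda_k/eps)
    upper = k0_quarter * (big_l + math.log2(6) + 0.5 * math.log2(k0_quarter))
    lead = k0 * big_l
    return lower / lead, upper / lead

for eps in (1e-6, 1e-15, 1e-30):
    print(eps, harmonic_capacity_ratios(0.3, 0.6, eps))
```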

Remark. Note that $\log(b/a) = \mathrm{const}\cdot L\{C\}$, where $\{C\}$ is the set of curves in the ring domain $0 < a < r < b < \infty$ that join $r = a$ to $r = b$, and $L\{C\}$ is the extremal length of this set. $L\{C\}$ is a conformal invariant […]. The r.h.s. of (55) may be regarded as a particular case of a more general result due to Erohin (see […]). This result shows that for general sets of analytic functions

$$ H_\varepsilon^{(+)} \sim C_\varepsilon^{(+)} \sim \gamma \log^2\frac{1}{\varepsilon}, \tag{56} $$

where $\gamma$ depends on some conformal invariant.

Consider now the order of growth $\rho(\mathcal{E})$ of the $\varepsilon$-entropy and the exponent of convergence $\lambda$. From (49) it follows that $\rho(\mathcal{E}) = \lambda = 0$. We therefore turn to the logarithmic order of growth $\sigma(\mathcal{E})$ and, correspondingly, to the logarithmic exponent of convergence $\mu$. Here $\mu = 1$, hence $\sigma(\mathcal{E}) = 2$, and the exponential degree of compactness is $d_c^e = 2^{1/\sigma} = 2^{1/2}$.

2.2.2. Backward solution of the heat equation. Let us consider a heat-conducting ring of radius 1. One can pose two problems:

i) Direct problem. Determine the temperature distribution $h(t,\theta)$ at time $t$, when $h(0,\theta)$ is given. The solution is obtained by solving the Cauchy problem for the heat equation:

$$ h_t = D\,h_{\theta\theta}, \qquad D > 0, \tag{57} $$

$$ h(0,\theta) = h_0(\theta), \qquad 0 \le \theta \le 2\pi. \tag{58} $$

ii) Inverse problem. Determine the temperature distribution $h(b,\theta) = f(\theta)$ at time $t = b$, when $h(a,\theta) = g(\theta)$, $a > b$, is given.

The solution to the inverse problem is obtained by solving the Fredholm integral equation of the first kind

$$ h(a,\theta) = g(\theta) = \int_{-\pi}^{\pi} K(\theta - \phi)\, f(\phi)\, d\phi, \tag{59} $$

where the kernel $K(\theta - \phi)$ is the elliptic Jacobi theta function

$$ K(\theta - \phi) = \frac{1}{2\pi} \sum_{k=-\infty}^{+\infty} e^{-Dk^2(a-b)}\, e^{ik(\theta - \phi)}. \tag{60} $$

The eigenfunctions and eigenvalues of the integral operator $A$ are $\psi_k(\theta) = e^{ik\theta}$ (again up to normalization) and $\lambda_k = \exp\{-Dk^2(a-b)\}$, respectively; moreover, $\lim_{|k|\to\infty}\lambda_k = 0$. Once again we assume that the solution space $X$ and the data space $Y$ are both $L^2[-\pi,\pi]$. We may now consider the expansion

$$ h(t,\theta) = \sum_{k=-\infty}^{+\infty} h_k\, e^{-Dk^2 t}\, e^{ik\theta}, \tag{61} $$

which converges in the $L^2$-norm.
Again, $\mathrm{Range}(A)$ is not closed in $L^2[-\pi,\pi]$. Indeed, a data function $h(a,\cdot)$ belongs to $\mathrm{Range}(A)$ only if

$$ \sum_{k=-\infty}^{+\infty} \left( |h_k|\, e^{-Dk^2 a}\, e^{Dk^2(a-b)} \right)^2 < \infty. \tag{62} $$

If a noise $n$ is added to the data function $g$, only the function $\bar g = g + n$ is known, and in general it does not belong to $\mathrm{Range}(A)$. Nevertheless, we assume here too that $\bar g \in \mathrm{Range}(A)$. Next, we restrict the solution space to the functions which satisfy the a priori constraint

$$ \sum_{k=-\infty}^{+\infty} \left( |h_k|\, e^{-Dk^2 b} \right)^2 \le 1. \tag{63} $$

The truncation number $k_0(\varepsilon)$, i.e., the largest integer such that $\lambda_k \ge \varepsilon$, is easily evaluated:

$$ k_0(\varepsilon) = \left[ \left( \frac{\log(1/\varepsilon)}{D(a-b)} \right)^{1/2} \right]. \tag{64} $$
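The truncation number of the backward heat problem can also be obtained by direct counting, as in the sketch below. The diffusion constant and times are hypothetical values. Solving $\exp\{-Dk^2(a-b)\} \ge \varepsilon$ for $k$ gives $k \le \left(\ln(1/\varepsilon)/(D(a-b))\right)^{1/2}$ with the natural logarithm, and the sketch uses that form for comparison. The point to notice is the slow growth: even at $\varepsilon = 10^{-100}$ only about twenty modes per sign of $k$ survive.

```python
import math

def heat_k0(D, a, b, eps):
    """Largest k with lambda_k = exp(-D k^2 (a - b)) >= eps, counted directly (a > b)."""
    k = 0
    while math.exp(-D * (k + 1) ** 2 * (a - b)) >= eps:
        k += 1
    return k

def heat_k0_closed_form(D, a, b, eps):
    """Same number from solving the inequality: integer part of sqrt(ln(1/eps) / (D (a - b)))."""
    return math.floor(math.sqrt(math.log(1 / eps) / (D * (a - b))))

D, a, b = 1.0, 1.0, 0.5     # hypothetical diffusion constant and times, a > b
for eps in (1e-3, 1e-8, 1e-20, 1e-100):
    print(eps, heat_k0(D, a, b, eps), heat_k0_closed_form(D, a, b, eps))
```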


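By arguments analogous to those used for harmonic continuation, and by splitting the sums (61) to (63) into two sums as before, we obtain

$$ \sum_{k=1}^{k_0(\varepsilon)} \log\frac{\lambda_k}{\varepsilon} \le C_\varepsilon^{(+)}(\mathcal{E}) \le k_0\!\left(\frac{\varepsilon}{4}\right)\left[\log\frac{1}{\varepsilon} + \log 6 + \frac12\log k_0\!\left(\frac{\varepsilon}{4}\right)\right] \le \left(\frac{2 + \log(1/\varepsilon)}{D(a-b)}\right)^{1/2}\left[\log\frac{1}{\varepsilon} + \log 6 + \frac12\log k_0\!\left(\frac{\varepsilon}{4}\right)\right]. \tag{65} $$

As $\varepsilon \to 0$, the leading term on the r.h.s. of (65) is

$$ k_0(\varepsilon)\log\frac{1}{\varepsilon} \sim \left(\frac{\log(1/\varepsilon)}{D(a-b)}\right)^{1/2}\log\frac{1}{\varepsilon}, \tag{66} $$

while the leading term on the l.h.s. of (65) is

$$ \left(1 - \frac13\log e\right) k_0(\varepsilon)\log\frac{1}{\varepsilon}. \tag{67} $$

We therefore have quite sharp bounds on the $\varepsilon$-capacity:

$$ \left(1 - \frac13\log e\right) k_0(\varepsilon)\log\frac{1}{\varepsilon} \lesssim C_\varepsilon^{(+)}(\mathcal{E}) \lesssim k_0(\varepsilon)\log\frac{1}{\varepsilon}. \tag{68} $$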

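This gives an upper bound for the maximal length, in binary units, of the messages conveyed back from the data for reconstructing the solution:

$$ L_{\max}^{(+)}(\varepsilon) \le 2^{k_0(\varepsilon)\log(1/\varepsilon)}, \qquad k_0(\varepsilon)\log\frac{1}{\varepsilon} \underset{\varepsilon\to0}{\sim} \frac{[\log(1/\varepsilon)]^{3/2}}{\sqrt{D(a-b)}}. \tag{69} $$

The final result for the total maximal length of the messages is therefore

$$ L_{\max}(\varepsilon) = L_{\max}^{(+)}(\varepsilon) + L_{\max}^{(-)}(\varepsilon) \le 2^{k_0(\varepsilon)\log(1/\varepsilon) + 1}, \qquad \log L_{\max}(\varepsilon) \le k_0(\varepsilon)\log\frac{1}{\varepsilon} + 1 \underset{\varepsilon\to0}{\sim} \frac{[\log(1/\varepsilon)]^{3/2}}{\sqrt{D(a-b)}}. \tag{70} $$

Here $\rho(\mathcal{E}) = \lambda = 0$ and $\mu = 1/2$, hence $\sigma(\mathcal{E}) = 3/2$, and the exponential degree of compactness is $d_c^e = 2^{2/3}$.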
2.2.3. First kind Fredholm integral equation with continuous kernel. Let us consider the following Fredholm integral equation of the first kind:

$$ Af = \int_0^1 K(x,y)\, f(y)\, dy = g(x), \tag{71} $$

where the kernel $K(x,y)$ is the continuous function

$$ K(x,y) = (1-x)\,y, \qquad 0 \le y \le x \le 1, \tag{72} $$

$$ K(x,y) = x\,(1-y), \qquad 0 \le x \le y \le 1. \tag{73} $$

The eigenfunctions and eigenvalues of the operator $A$ in Equation (71) are easily evaluated: the eigenfunctions are $\psi_k(x) = \sqrt2\sin(k\pi x)$ and the eigenvalues are $\lambda_k = 1/(k^2\pi^2)$. Following the same considerations as in the previous examples, we obtain $k_0(\varepsilon) = [1/(\pi\sqrt{\varepsilon})]$. For $\varepsilon$ sufficiently small, $(2\log e)\,k_0(\varepsilon) \lesssim C_\varepsilon(\mathcal{E})$, while by (38) $C_\varepsilon(\mathcal{E})$ is at most of the order of $k_0(\varepsilon)\log(1/\varepsilon)$. Consequently, $\rho(\mathcal{E}) = \lambda = 1/2$, $d_c = 2$ and

$$ L_{\max}(\varepsilon) \le 2^{k_0(\varepsilon)\log(1/\varepsilon)} \le 2^{\log(1/\varepsilon)/(\pi\sqrt{\varepsilon})}. \tag{74} $$

Remark. With reference to this last example, the reader interested in sharp bounds on the $\varepsilon$-capacity in the general setting of Sobolev spaces is referred to […] (see also Section 6 of […]).
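The eigenvalues of the kernel (72)-(73) can be confirmed numerically. Note that $K(x,y) = \min(x,y)\,(1 - \max(x,y))$. The sketch below is ours and is not the authors' computation. It approximates the Rayleigh quotient $(A\psi_k, \psi_k)$ for the normalized $\psi_k(x) = \sqrt2\sin(k\pi x)$ with a two-dimensional midpoint rule on a hypothetical $300 \times 300$ grid. The result should match $\lambda_k = 1/(k^2\pi^2)$ to within the discretization error.

```python
import math

def green_kernel(x, y):
    """Continuous kernel (72)-(73): K(x, y) = min(x, y) * (1 - max(x, y))."""
    return min(x, y) * (1.0 - max(x, y))

def rayleigh_quotient(k, n=300):
    """Midpoint-rule approximation of (A psi_k, psi_k), psi_k(x) = sqrt(2) sin(k pi x)."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    psi = [math.sqrt(2.0) * math.sin(k * math.pi * x) for x in xs]
    total = 0.0
    for x, px in zip(xs, psi):
        row = sum(green_kernel(x, y) * py for y, py in zip(xs, psi))  # (A psi_k)(x) / h
        total += px * row
    return total * h * h

for k in (1, 2, 3):
    print(k, rayleigh_quotient(k), 1.0 / (k * math.pi) ** 2)
```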

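Summarizing, we have the following table:

$$
\begin{array}{cccc}
\text{Behavior of } \lambda_k & \log L_{\max}(\varepsilon) & d_c & d_c^{e} \\ \hline
e^{-c_1 k} & c_1^{-1}\,[\log(1/\varepsilon)]^{2} & - & 2^{1/2} \\
e^{-c_2 k^2} & c_2^{-1/2}\,[\log(1/\varepsilon)]^{3/2} & - & 2^{2/3} \\
c_3/k^2 & c_3^{1/2}\,\varepsilon^{-1/2}\log(1/\varepsilon) & 2 & -
\end{array}
$$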



3. Probabilistic Information 

Here we want to reconsider Equation (1) from a probabilistic point of view, adding explicitly the term representing the noise. With this in mind we pass from $Af = g$ to its noisy version $Af + n = \bar{g}$, and then to the probabilistic form of the latter, i.e., $A\xi + \zeta = \eta$, where $\xi$, $\zeta$ and $\eta$ are Gaussian weak random variables (w.r.v.) in the Hilbert space $L^2[a,b]$ […]. A Gaussian w.r.v. is uniquely defined by its mean element and its covariance operator; in the present case we denote by $R_{\xi\xi}$, $R_{\zeta\zeta}$ and $R_{\eta\eta}$ the covariance operators of $\xi$, $\zeta$ and $\eta$, respectively. Next, we make the following assumptions:

i) $\xi$ and $\zeta$ have zero mean, i.e., $m_\xi = m_\zeta = 0$;

ii) $\xi$ and $\zeta$ are uncorrelated, i.e., $R_{\xi\zeta} = 0$;

iii) $R_{\zeta\zeta}^{-1}$ exists.

Regarding assumption (i), if it is known that $m_\xi \neq 0$ or $m_\zeta \neq 0$, then the problem can easily be reformulated in terms of the variables $(\xi - m_\xi)$ and $(\zeta - m_\zeta)$. The second hypothesis simply states that the signal process $\xi$ and the noise process $\zeta$ are independent (for jointly Gaussian variables, uncorrelatedness and independence coincide). Finally, the third assumption is the mathematical formulation of the fact that all the components of the data function are affected by noise or, in other words, that no component of the noise is equal to zero with probability one. As shown by Franklin (see formula (3.11) of […]), if assumptions (i) and (ii) are satisfied, then

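$$ R_{\eta\eta} = A R_{\xi\xi} A^* + R_{\zeta\zeta}, \qquad (75) $$

and the cross-covariance operator is given by:

$$ R_{\xi\eta} = R_{\xi\xi} A^*. \qquad (76) $$

We also assume that $R_{\zeta\zeta}$ depends on a parameter $\varepsilon$ that tends to zero when the noise vanishes, i.e.,

$$ R_{\zeta\zeta} = \varepsilon^2 N, \qquad (77) $$

where $N$ is a given operator, e.g., $N = I$ for white noise.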
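Formulae (75) and (76) can be illustrated with a finite-dimensional analogue of the Hilbert-space setting. The sketch below assumes hypothetical $3 \times 3$ matrices in place of the operators and white noise ($N = I$); it draws samples of $\eta = A\xi + \zeta$ with NumPy and compares the sample covariances with $AR_{\xi\xi}A^* + R_{\zeta\zeta}$ and $R_{\xi\xi}A^*$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, eps = 3, 200_000, 0.1

# Hypothetical finite-dimensional stand-ins for the operators.
M = rng.standard_normal((d, d))
A = (M + M.T) / 2                      # self-adjoint A
C = rng.standard_normal((d, d))
R_xx = C @ C.T + np.eye(d)             # covariance of xi (positive definite)
R_zz = eps ** 2 * np.eye(d)            # (77) with N = I (white noise)

xi = rng.multivariate_normal(np.zeros(d), R_xx, size=n_samples)
zeta = rng.multivariate_normal(np.zeros(d), R_zz, size=n_samples)  # independent of xi
eta = xi @ A.T + zeta                  # each row: eta = A xi + zeta

R_ee_mc = eta.T @ eta / n_samples      # sample covariance (means are zero)
R_xe_mc = xi.T @ eta / n_samples       # sample cross-covariance E{xi eta^T}
R_ee = A @ R_xx @ A.T + R_zz           # formula (75)
R_xe = R_xx @ A.T                      # formula (76)

print(np.max(np.abs(R_ee_mc - R_ee)), np.max(np.abs(R_xe_mc - R_xe)))
```

The discrepancies shrink like $1/\sqrt{n}$ in the number of samples; the infinite-dimensional subtleties of weak random variables are, of course, not captured by such a sketch.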

Now we are faced with the following problem:

Problem. Given a value $\bar{g}$ of the w.r.v. $\eta$, find an estimate of the w.r.v. $\xi$.

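In order to answer this problem, we turn the equation $A\xi + \zeta = \eta$ into an infinite sequence of one-dimensional equations by means of the orthogonal projections onto the eigenfunctions $\psi_k$ of $A$, obtaining

$$ \lambda_k \xi_k + \zeta_k = \eta_k, \qquad k = 1, 2, \ldots, $$

where $\xi_k = (\xi, \psi_k)$, $\zeta_k = (\zeta, \psi_k)$, $\eta_k = (\eta, \psi_k)$ are Gaussian random variables. Accordingly, we introduce the variances $\rho_k^2 = (R_{\xi\xi}\psi_k, \psi_k)$, $\varepsilon^2\nu_k^2 = (R_{\zeta\zeta}\psi_k, \psi_k)$, $\lambda_k^2\rho_k^2 + \varepsilon^2\nu_k^2 = (R_{\eta\eta}\psi_k, \psi_k)$. Next we evaluate the amount of information on the variable $\xi_k$ which is contained in the variable $\eta_k$; we have […]: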

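$$ J(\xi_k, \eta_k) = -\frac{1}{2}\ln\left(1 - r_k^2\right), \qquad (78) $$

where

$$ r_k^2 = \frac{|E\{\xi_k\eta_k\}|^2}{E\{|\xi_k|^2\}\,E\{|\eta_k|^2\}} = \frac{(\lambda_k\rho_k)^2}{(\lambda_k\rho_k)^2 + (\varepsilon\nu_k)^2}. \qquad (79) $$

Thus

$$ J(\xi_k, \eta_k) = \frac{1}{2}\ln\left(1 + \frac{\lambda_k^2\rho_k^2}{\varepsilon^2\nu_k^2}\right). \qquad (80) $$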
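The equivalence of (78)-(79) and (80) is easy to confirm numerically. The two helper functions below are hypothetical names used only for illustration; both return the information in nats, and the parameter triples are arbitrary.

```python
import math

def info_from_correlation(lam, rho, eps, nu):
    # (78)-(79): J = -1/2 ln(1 - r^2), r^2 = (lam rho)^2 / ((lam rho)^2 + (eps nu)^2)
    s, n = (lam * rho) ** 2, (eps * nu) ** 2
    return -0.5 * math.log(1.0 - s / (s + n))

def info_from_snr(lam, rho, eps, nu):
    # (80): J = 1/2 ln(1 + (lam rho / (eps nu))^2)
    return 0.5 * math.log1p((lam * rho / (eps * nu)) ** 2)

for lam, rho, eps, nu in [(0.5, 1.0, 1e-2, 1.0), (1e-3, 2.0, 1e-2, 0.5), (0.2, 0.5, 0.1, 1.0)]:
    print(info_from_correlation(lam, rho, eps, nu), info_from_snr(lam, rho, eps, nu))
```

The last case has $\lambda_k\rho_k = \varepsilon\nu_k$, where both forms give $\tfrac12\ln 2$. At very large signal-to-noise ratios the correlation form loses accuracy, since $1 - r_k^2$ is then computed by cancellation, so (80) is the preferable form numerically.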

From equality (80) it follows that $J(\xi_k, \eta_k) < \frac{1}{2}\ln 2$ if $\lambda_k\rho_k < \varepsilon\nu_k$, that is, if the signal-to-noise ratio of the $k$th component is small. Thus, we are naturally led to introduce the following two sets: one, denoted by $\mathcal{J}$, which accounts for the components in which the signal dominates the noise; the other, denoted by $\mathcal{N}$, which is instead related to the components in which the noise prevails; precisely, we define:


$$ \mathcal{J} = \{k : \lambda_k\rho_k > \varepsilon\nu_k\}, \qquad (81) $$

$$ \mathcal{N} = \{k : \lambda_k\rho_k < \varepsilon\nu_k\}. \qquad (82) $$

Remark. Let us note that the sets $\mathcal{J}$ and $\mathcal{N}$ are not equipped, in general, with any order relation. However, we can rearrange and renumber the terms $\lambda_k\rho_k$ and $\varepsilon\nu_k$ in such a way that the ratio $\lambda_k\rho_k/\nu_k$ decreases as $k$ increases. Furthermore, for the sake of simplicity and without loss of generality, we hereafter assume that there do not exist two identical terms $\lambda_k\rho_k/\nu_k$ corresponding to different values of $k$. In this situation there exists a unique value of $k$, denoted by $k_{\mathcal{J}}$, which separates the set $\mathcal{J}$ from the set $\mathcal{N}$.
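Under the renumbering described in the Remark, the separating index $k_{\mathcal{J}}$ can be found by a direct scan. The sketch below assumes a finite number of components and hypothetical sequences $\lambda_k = 1/(k^2\pi^2)$, $\rho_k = 1/k$, $\nu_k = 1$, which are already ordered by decreasing $\lambda_k\rho_k/\nu_k$.

```python
import math

def split_components(lam, rho, nu, eps):
    """Return the 1-based index lists of the sets J and N of (81)-(82), and k_J = max J.

    Assumes the components are already renumbered so that lam*rho/nu decreases,
    in which case J = {1, ..., k_J} and N = {k_J + 1, ...}.
    """
    J = [k for k, (l, r, n) in enumerate(zip(lam, rho, nu), start=1) if l * r > eps * n]
    N = [k for k, (l, r, n) in enumerate(zip(lam, rho, nu), start=1) if l * r < eps * n]
    k_J = max(J, default=0)
    return J, N, k_J

# Hypothetical example: first 20 components
K = 20
lam = [1.0 / (k * k * math.pi ** 2) for k in range(1, K + 1)]
rho = [1.0 / k for k in range(1, K + 1)]
nu = [1.0] * K
J, N, k_J = split_components(lam, rho, nu, eps=1e-3)
print(J, k_J)
```

For $\varepsilon = 10^{-3}$ this gives $\mathcal{J} = \{1, 2, 3, 4\}$ and $k_{\mathcal{J}} = 4$, since $\lambda_k\rho_k/\nu_k = 1/(k^3\pi^2)$ exceeds $10^{-3}$ exactly for $k \le 4$.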
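Since $\xi_k$ and $\zeta_k$ are supposed to be Gaussian random variables, we can assume the following probability densities:

$$ p_{\xi_k}(x) = \frac{1}{\sqrt{2\pi}\,\rho_k}\exp\left\{-\frac{x^2}{2\rho_k^2}\right\}, \qquad k = 1, 2, \ldots, \qquad (83) $$

$$ p_{\zeta_k}(x) = \frac{1}{\sqrt{2\pi}\,\varepsilon\nu_k}\exp\left\{-\frac{x^2}{2\varepsilon^2\nu_k^2}\right\}, \qquad k = 1, 2, \ldots \qquad (84) $$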

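From the one-dimensional equations $\lambda_k\xi_k + \zeta_k = \eta_k$ we can also introduce the conditional probability density $p_{\eta_k}(y|x)$ of the random variable $\eta_k$, for fixed $\xi_k = x$, which reads:

$$ p_{\eta_k}(y|x) = \frac{1}{\sqrt{2\pi}\,\varepsilon\nu_k}\exp\left\{-\frac{(y - \lambda_k x)^2}{2\varepsilon^2\nu_k^2}\right\} = \frac{1}{\sqrt{2\pi}\,\varepsilon\nu_k}\exp\left\{-\frac{\lambda_k^2}{2\varepsilon^2\nu_k^2}\left(x - \frac{y}{\lambda_k}\right)^2\right\}. \qquad (85) $$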

Let us now apply the Bayes formula, which provides the conditional probability density of $\xi_k$ given $\eta_k$ through the following expression:

$$ p_{\xi_k}(x|y) = \frac{p_{\xi_k}(x)\,p_{\eta_k}(y|x)}{p_{\eta_k}(y)}. \qquad (86) $$

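Thus, if a realization of the random variable $\eta_k$ is given by $\bar{g}_k$, formula (86) becomes

$$ p_{\xi_k}(x|\bar{g}_k) = A_k \exp\left\{-\frac{x^2}{2\rho_k^2}\right\}\exp\left\{-\frac{\lambda_k^2}{2\varepsilon^2\nu_k^2}\left(x - \frac{\bar{g}_k}{\lambda_k}\right)^2\right\}, \qquad A_k = \text{const}. \qquad (87) $$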

The conditional probability density (87) can be regarded as the product of two Gaussian probability densities:

$$ p_1(x) = \frac{1}{\sqrt{2\pi}\,\rho_k}\exp\left\{-\frac{x^2}{2\rho_k^2}\right\}, $$

$$ p_2(x) = \frac{\lambda_k}{\sqrt{2\pi}\,\varepsilon\nu_k}\exp\left\{-\frac{\lambda_k^2}{2\varepsilon^2\nu_k^2}\left(x - \frac{\bar{g}_k}{\lambda_k}\right)^2\right\}, $$

whose variances are respectively given by $\rho_k^2$ and $(\varepsilon\nu_k/\lambda_k)^2$. Let us note that if $k \in \mathcal{J}$, the variance associated with the density $p_2(x)$ is smaller than the corresponding variance of $p_1(x)$, and vice versa if $k \in \mathcal{N}$. Therefore, it is reasonable to take as an acceptable approximation of $\langle \xi_k \rangle$ the mean value given by the density $p_2(x)$ if $k \le k_{\mathcal{J}}$ (i.e., if $k \in \mathcal{J}$), and the mean value given by the density $p_1(x)$ if $k > k_{\mathcal{J}}$ (i.e., if $k \in \mathcal{N}$). We can write the following approximation:


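$$ \langle \xi_k \rangle \simeq \begin{cases} \bar{g}_k/\lambda_k & \text{if } k \le k_{\mathcal{J}}, \\ 0 & \text{if } k > k_{\mathcal{J}}. \end{cases} \qquad (88) $$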
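The approximation (88) can be compared with the exact mean of the product density $p_1(x)\,p_2(x)$. The sketch below uses the standard fact that the product of two Gaussian densities is proportional to a Gaussian whose mean is the precision-weighted average of the two means; the function names and parameter values are hypothetical.

```python
def posterior_mean(g_k, lam, rho, eps, nu):
    """Exact mean of the product p1(x) p2(x) in (87).

    p1 has mean 0 and variance rho^2; p2 has mean g_k/lam and variance (eps nu/lam)^2.
    The product of two Gaussians is Gaussian with precision-weighted mean.
    """
    v1, v2 = rho ** 2, (eps * nu / lam) ** 2
    return (0.0 / v1 + (g_k / lam) / v2) / (1.0 / v1 + 1.0 / v2)

def approx_mean(g_k, lam, rho, eps, nu):
    # Approximation (88): keep g_k/lam for signal-dominated components, else 0
    return g_k / lam if lam * rho > eps * nu else 0.0

# Signal-dominated component: the two means nearly coincide
print(posterior_mean(0.3, 0.5, 1.0, 1e-3, 1.0), approx_mean(0.3, 0.5, 1.0, 1e-3, 1.0))
# Noise-dominated component: the exact mean is close to 0 on the scale rho = 1
print(posterior_mean(0.05, 1e-4, 1.0, 0.1, 1.0), approx_mean(0.05, 1e-4, 1.0, 0.1, 1.0))
```

Algebraically, the exact mean equals $\bar g_k\lambda_k\rho_k^2/(\lambda_k^2\rho_k^2 + \varepsilon^2\nu_k^2)$, which tends to $\bar g_k/\lambda_k$ when $\lambda_k\rho_k \gg \varepsilon\nu_k$ and, for data of typical size, is small compared with $\rho_k$ when $\lambda_k\rho_k \ll \varepsilon\nu_k$.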


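Consequently, given the value $\bar{g}$ of the w.r.v. $\eta$, we are led to consider the following estimate of $\xi$: $\xi_{\mathcal{J}} = \sum_{k \in \mathcal{J}} (\bar{g}_k/\lambda_k)\,\psi_k$. Next, we introduce the operator $B_{\mathcal{J}} : L^2[a,b] \to L^2[a,b]$, defined as follows:

$$ B_{\mathcal{J}}\psi_k = \begin{cases} \dfrac{1}{\lambda_k}\psi_k & \text{if } k \le k_{\mathcal{J}}, \\ 0 & \text{if } k > k_{\mathcal{J}}; \end{cases} \qquad (89) $$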
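then $\xi_{\mathcal{J}} = B_{\mathcal{J}}\bar{g} = \sum_{k=1}^{k_{\mathcal{J}}} (\bar{g}_k/\lambda_k)\,\psi_k$. We can now evaluate the global mean-square error; taking into account formulae (75) and (76), we can formally write:

$$ E\{\|\xi - B_{\mathcal{J}}\eta\|^2\} = \mathrm{Tr}\left(R_{\xi\xi} - R_{\xi\xi}AB_{\mathcal{J}}^* - B_{\mathcal{J}}AR_{\xi\xi} + B_{\mathcal{J}}R_{\eta\eta}B_{\mathcal{J}}^*\right) = \sum_{k=k_{\mathcal{J}}+1}^{\infty}\rho_k^2 + \sum_{k=1}^{k_{\mathcal{J}}}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2. \qquad (90) $$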
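Formula (90) can be checked by simulation. The sketch below assumes a finite number $K = 10$ of components with hypothetical $\lambda_k = 1/(k^2\pi^2)$, $\rho_k = 1/k$, $\nu_k = 1$ and $\varepsilon = 10^{-3}$ (so that $k_{\mathcal{J}} = 4$), draws independent Gaussian components, applies $B_{\mathcal{J}}$ componentwise as in (89), and compares the average squared error with the closed form.

```python
import math
import random

def mse_formula(lam, rho, nu, eps, k_J):
    # Right-hand side of (90), restricted to the K components supplied
    tail = sum(r ** 2 for r in rho[k_J:])                                   # k > k_J
    noise = sum((eps * n / l) ** 2 for l, n in zip(lam[:k_J], nu[:k_J]))  # k <= k_J
    return tail + noise

def mse_monte_carlo(lam, rho, nu, eps, k_J, trials=20_000, seed=1):
    # xi_k ~ N(0, rho_k^2), zeta_k ~ N(0, (eps nu_k)^2), eta_k = lam_k xi_k + zeta_k;
    # apply B_J componentwise and average the total squared error over trials.
    rnd = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for k, (l, r, n) in enumerate(zip(lam, rho, nu), start=1):
            xi = rnd.gauss(0.0, r)
            eta = l * xi + rnd.gauss(0.0, eps * n)
            estimate = eta / l if k <= k_J else 0.0
            total += (xi - estimate) ** 2
    return total / trials

K, eps, k_J = 10, 1e-3, 4     # hypothetical choice; k_J = 4 for these sequences
lam = [1.0 / (k * k * math.pi ** 2) for k in range(1, K + 1)]
rho = [1.0 / k for k in range(1, K + 1)]
nu = [1.0] * K
print(mse_formula(lam, rho, nu, eps, k_J), mse_monte_carlo(lam, rho, nu, eps, k_J))
```

For these values the closed form is approximately 0.1606, and the Monte Carlo estimate should fall within a couple of percent of it.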


The series in (90) converges if and only if $\mathrm{Tr}\,R_{\xi\xi} = \sum_{k=1}^{\infty}\rho_k^2 < \infty$, i.e., if the covariance operator $R_{\xi\xi}$ is of trace class. In the following we assume that this condition is satisfied. Hereafter we also suppose that $\lim_{k\to\infty}(\lambda_k\rho_k/\nu_k) = 0$, and therefore the set $\mathcal{J}$ has finite cardinality for any given $\varepsilon > 0$. Next, we prove the following lemma.

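Lemma 2. If $\mathrm{Tr}\,R_{\xi\xi} = T < \infty$ and moreover $\lim_{k\to\infty}(\lambda_k\rho_k/\nu_k) = 0$, then we can introduce a number $k_a(\varepsilon)$ defined as follows:

$$ k_a(\varepsilon) = \max\left\{ n : \sum_{k=1}^{n}\left[\rho_k^2 + \left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2\right] \le T \right\}, \qquad (91) $$

and the following statements hold:

$$ \text{(i)} \quad \lim_{\varepsilon\to 0} k_a(\varepsilon) = +\infty; \qquad (92) $$

$$ \text{(ii)} \quad \lim_{\varepsilon\to 0}\sum_{k=k_a(\varepsilon)+1}^{\infty}\rho_k^2 = 0, \qquad \lim_{\varepsilon\to 0}\sum_{k=1}^{k_a(\varepsilon)}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2 = 0. \qquad (93) $$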


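Proof. (i) Let us denote by $k_{a1}$ the sum $k_a + 1$. If equality (92) is not true, then there should exist a finite number $M$, which does not depend on $\varepsilon$, and a sequence $\{\varepsilon_i\}$ converging to zero such that $k_{a1} \le M$ along this sequence. From formula (91) it then follows that

$$ T < \sum_{k=1}^{k_{a1}}\left[\rho_k^2 + \left(\frac{\varepsilon_i\nu_k}{\lambda_k}\right)^2\right] \le \sum_{k=1}^{M}\left[\rho_k^2 + \left(\frac{\varepsilon_i\nu_k}{\lambda_k}\right)^2\right]. \qquad (94) $$

Along this sequence, as $\varepsilon_i \to 0$, we have

$$ \lim_{i\to\infty}\sum_{k=1}^{M}\left[\rho_k^2 + \left(\frac{\varepsilon_i\nu_k}{\lambda_k}\right)^2\right] = \sum_{k=1}^{M}\rho_k^2 < \sum_{k=1}^{\infty}\rho_k^2 = T, \qquad (95) $$

and the contradiction is explicit.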

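(ii) Since $\lim_{\varepsilon\to 0}k_a(\varepsilon) = +\infty$ and $\sum_{k=1}^{\infty}\rho_k^2 < \infty$, then

$$ \lim_{\varepsilon\to 0}\sum_{k=k_a(\varepsilon)+1}^{\infty}\rho_k^2 = 0. \qquad (96) $$

Regarding the term $\sum_{k=1}^{k_a}(\varepsilon\nu_k/\lambda_k)^2$, we can proceed as follows: from formula (91) we have

$$ \sum_{k=1}^{k_a(\varepsilon)}\left[\rho_k^2 + \left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2\right] \le T = \sum_{k=1}^{\infty}\rho_k^2, \qquad (97) $$

and therefore

$$ \sum_{k=1}^{k_a(\varepsilon)}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2 \le \sum_{k=k_a(\varepsilon)+1}^{\infty}\rho_k^2. $$

Since $\lim_{\varepsilon\to 0}\sum_{k=k_a+1}^{\infty}\rho_k^2 = 0$ (see (96)), we have $\lim_{\varepsilon\to 0}\sum_{k=1}^{k_a}(\varepsilon\nu_k/\lambda_k)^2 = 0$. □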


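Finally, we can prove the following theorem.

Theorem 2. If the covariance operator $R_{\xi\xi}$ is of trace class, and $\lim_{k\to\infty}\lambda_k\rho_k/\nu_k = 0$, then the following limit holds true:

$$ \lim_{\varepsilon\to 0} E\{\|\xi - B_{\mathcal{J}}\eta\|^2\} = \lim_{\varepsilon\to 0}\left[\sum_{k=k_{\mathcal{J}}+1}^{\infty}\rho_k^2 + \sum_{k=1}^{k_{\mathcal{J}}}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2\right] = 0. $$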
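Proof. The proof proceeds in two steps.

a) We want to prove that $\lim_{\varepsilon\to 0}\sum_{k=k_{\mathcal{J}}+1}^{\infty}\rho_k^2 = 0$. We have two possibilities: either $k_{\mathcal{J}} \ge k_a$, or $k_{\mathcal{J}} < k_a$. In the former case the statement follows from the fact that $\lim_{\varepsilon\to 0}\sum_{k=k_a+1}^{\infty}\rho_k^2 = 0$. In the latter case, since $\rho_k < \varepsilon\nu_k/\lambda_k$ for $k > k_{\mathcal{J}}$ (i.e., for $k \in \mathcal{N}$), we have:

$$ \sum_{k=k_{\mathcal{J}}(\varepsilon)+1}^{\infty}\rho_k^2 = \sum_{k=k_{\mathcal{J}}+1}^{k_a}\rho_k^2 + \sum_{k=k_a+1}^{\infty}\rho_k^2 \le \sum_{k=1}^{k_a}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2 + \sum_{k=k_a+1}^{\infty}\rho_k^2. \qquad (100) $$

But in Lemma 2 we have proved that the r.h.s. of formula (100) tends to zero as $\varepsilon \to 0$, and the statement follows.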

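b) We want to prove that $\lim_{\varepsilon\to 0}\sum_{k=1}^{k_{\mathcal{J}}}(\varepsilon\nu_k/\lambda_k)^2 = 0$. Now again either $k_{\mathcal{J}} \le k_a$ or $k_{\mathcal{J}} > k_a$. In the first case the statement follows from $\lim_{\varepsilon\to 0}\sum_{k=1}^{k_a}(\varepsilon\nu_k/\lambda_k)^2 = 0$, as proved in Lemma 2. If, on the contrary, $k_{\mathcal{J}} > k_a$, then we have, for $k \le k_{\mathcal{J}}$, $\rho_k > \varepsilon\nu_k/\lambda_k$, and therefore

$$ \sum_{k=k_a+1}^{k_{\mathcal{J}}}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2 \le \sum_{k=k_a+1}^{k_{\mathcal{J}}}\rho_k^2 \le \sum_{k=k_a+1}^{\infty}\rho_k^2. \qquad (101) $$

Since $\lim_{\varepsilon\to 0}\sum_{k=k_a+1}^{\infty}\rho_k^2 = 0$, it follows that

$$ \lim_{\varepsilon\to 0}\sum_{k=k_a+1}^{k_{\mathcal{J}}}\left(\frac{\varepsilon\nu_k}{\lambda_k}\right)^2 = 0. \qquad (102) $$

Now the statement follows recalling that $\lim_{\varepsilon\to 0}\sum_{k=1}^{k_a}(\varepsilon\nu_k/\lambda_k)^2 = 0$, as proved in Lemma 2. □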
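Theorem 2 can be illustrated on hypothetical sequences $\lambda_k = 1/(k^2\pi^2)$, $\rho_k = 1/k$, $\nu_k = 1$, which satisfy both hypotheses ($\sum_k \rho_k^2 = \pi^2/6 < \infty$ and $\lambda_k\rho_k/\nu_k = 1/(k^3\pi^2) \to 0$). The sketch evaluates the right-hand side of (90), truncating the tail after $K = 10^5$ terms, for decreasing $\varepsilon$.

```python
import math

def truncated_mse(eps, K=100_000):
    """Right-hand side of (90) for the hypothetical sequences
    lam_k = 1/(k^2 pi^2), rho_k = 1/k, nu_k = 1 (tail truncated after K terms)."""
    def lam(k):
        return 1.0 / (k * k * math.pi ** 2)

    def rho(k):
        return 1.0 / k

    def nu(k):
        return 1.0

    # k_J: last index with lam_k rho_k > eps nu_k (the ratio decreases in k here)
    k_J = 0
    while k_J < K and lam(k_J + 1) * rho(k_J + 1) > eps * nu(k_J + 1):
        k_J += 1
    tail = sum(rho(k) ** 2 for k in range(k_J + 1, K + 1))               # k > k_J
    noise = sum((eps * nu(k) / lam(k)) ** 2 for k in range(1, k_J + 1))  # k <= k_J
    return k_J, tail + noise

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    k_J, err = truncated_mse(eps)
    print(f"eps={eps:.0e}  k_J={k_J}  mse={err:.4f}")
```

For $\varepsilon = 10^{-2}, 10^{-4}, 10^{-6}, 10^{-8}$ the separating index is $k_{\mathcal{J}} = 2, 10, 46, 216$, and the error decreases monotonically toward zero, though slowly (roughly like $\varepsilon^{1/3}$ for these particular sequences).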


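If we now sum up the information carried by the set $\{\eta_k\}_{k\in\mathcal{J}}$ on the corresponding set $\{\xi_k\}_{k\in\mathcal{J}}$, we obtain the quantity:

$$ \sum_{k=1}^{k_{\mathcal{J}}} J(\xi_k, \eta_k) = \frac{1}{2}\sum_{k=1}^{k_{\mathcal{J}}}\ln\left(1 + \frac{\lambda_k^2\rho_k^2}{\varepsilon^2\nu_k^2}\right) \simeq \sum_{k=1}^{k_{\mathcal{J}}}\ln\frac{\lambda_k\rho_k}{\varepsilon\nu_k}, \qquad (103) $$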


which could be called the probabilistic information associated with Equation (1). For the approximation on the r.h.s. of (103) we used $\lambda_k\rho_k \gg \varepsilon\nu_k$ for $k \in \mathcal{J}$. Now, in order to compare the probabilistic information with the metric information, we may consider two somewhat extremal approximations:
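(α) If $\rho_k \simeq \nu_k$, $k \in \mathcal{J}$, we have

$$ \sum_{k=1}^{k_{\mathcal{J}}}\ln\frac{\lambda_k\rho_k}{\varepsilon\nu_k} \simeq \sum_{k=1}^{k_{\mathcal{J}}}\ln\frac{\lambda_k}{\varepsilon} = \sum_{k=1}^{k_0}\ln\frac{\lambda_k}{\varepsilon}, \qquad (104) $$

since in this case $\mathcal{J} = \{k : \lambda_k > \varepsilon\}$ and hence $k_{\mathcal{J}} = k_0$. Let us note that the r.h.s. of formula (104) coincides (up to an immaterial conversion factor between logarithm bases) with the lower bound for the $\varepsilon$-entropy $H_\varepsilon$.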

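(β) If $\lambda_k\rho_k \simeq \nu_k$, $k \in \mathcal{J}$, we have

$$ \sum_{k=1}^{k_{\mathcal{J}}}\ln\frac{\lambda_k\rho_k}{\varepsilon\nu_k} \simeq k_{\mathcal{J}}(\varepsilon)\ln\frac{1}{\varepsilon} \simeq k_0(\varepsilon)\ln\frac{1}{\varepsilon}, \qquad (105) $$

which coincides with the upper bound for $H_{\varepsilon/2}$ which we have computed in the various examples of the previous section.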
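The probabilistic information (103) and the two extremal approximations (104) and (105) can be evaluated side by side. The sketch assumes the eigenvalues $\lambda_k = 1/(k^2\pi^2)$ of Section 2.2.3, hypothetical $\rho_k = 1/k$ and $\nu_k = 1$, a finite number of components, natural logarithms, and $k_0$ read as the number of eigenvalues exceeding $\varepsilon$; the function names are illustrative only.

```python
import math

def probabilistic_information(lam, rho, nu, eps):
    """Exact sum of (80) over the signal-dominated set and the r.h.s. approximation of (103)."""
    exact = approx = 0.0
    for l, r, n in zip(lam, rho, nu):
        snr = l * r / (eps * n)
        if snr > 1.0:                          # component belongs to the set (81)
            exact += 0.5 * math.log1p(snr ** 2)
            approx += math.log(snr)
    return exact, approx

def extremal_cases(lam, eps):
    """(104): sum_{k<=k0} ln(lam_k/eps);  (105): k0 ln(1/eps)."""
    k0 = sum(1 for l in lam if l > eps)        # lam is decreasing here
    alpha = sum(math.log(l / eps) for l in lam[:k0])
    beta = k0 * math.log(1.0 / eps)
    return k0, alpha, beta

K, eps = 200, 1e-4
lam = [1.0 / (k * k * math.pi ** 2) for k in range(1, K + 1)]
rho = [1.0 / k for k in range(1, K + 1)]
nu = [1.0] * K
print(probabilistic_information(lam, rho, nu, eps))
print(extremal_cases(lam, eps))
```

Since $\tfrac12\ln(1+x^2) \ge \ln x$, the exact sum is never below its approximation, and the excess per component is at most $\tfrac12\ln 2$ when $x > 1$; because here $\lambda_k < 1$, the value of (104) never exceeds that of (105).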

It is interesting to note that the metric information provides the limits of the range over which the probabilistic information varies when the signal-to-noise ratio ranges between the extremes given by the two previous approximations. The results given by approximations (α) and (β) allow us to look at the analogy and parallelism between metric and probabilistic information on more precise and quantitative grounds.